QurAna: Corpus of the Quran annotated with Pronominal Anaphora
Abstract
This paper presents QurAna, a large corpus created from the original Quranic text in which personal pronouns are tagged with their antecedents. These antecedents are maintained as an ontological list of concepts, which has proved helpful for information retrieval tasks. QurAna is characterized by (a) a comparatively large number of pronouns tagged with antecedent information (over 24,500 pronouns) and (b) the maintenance of an ontological concept list built from these antecedents. We have shown useful applications of this corpus. It is the first corpus of its kind covering Classical Arabic text, and it could also be used for applications in Modern Standard Arabic. The corpus will enable researchers to derive empirical patterns and rules for building new anaphora resolution approaches, and it can also be used to train, optimize, and evaluate existing approaches.
Similar resources
A Hybrid Approach to Pronominal Anaphora Resolution in Arabic
Corresponding Author: Abdullatif Abolohom, Department of Computer Science, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia. Abstract: One of the challenges in natural language processing is to determine the intended referents of pronouns in the discourse. Performing anaphora resolution ...
Zero Pronominal Anaphora Resolution for the Romanian Language
This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...
The DAD Parallel Corpora and their Uses
This paper deals with the uses of the annotations of third person singular neuter pronouns in the DAD parallel and comparable corpora of Danish and Italian texts and spoken data. The annotations contain information about the functions of these pronouns and their uses as abstract anaphora. Abstract anaphora have constructions such as verbal phrases, clauses and discourse segments as antecedents ...
Where Anaphora and Coreference Meet. Annotation in the Spanish CESS-ECE Corpus
This paper describes the guidelines of the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, which is a significant step towards the definition of an exhaustive typology of pronominal and full NP coreferential expressions and their relations for Spanish. The goal is twofold. From a computational perspective, this work establishes the formal foundatio...
Pronominal Reference Type Identification and Event Anaphora Resolution for Hindi
In this paper, we present hybrid approaches for pronominal reference type (abstract or concrete) identification and event anaphora resolution for Hindi. Pronominal reference type identification is an important part of any anaphora resolution system, as it helps the anaphora resolver select optimal features based on pronominal reference types. We use language specific rules and feature...